Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SQLite client and converter #424

Merged
merged 3 commits into from
Feb 5, 2024
Merged

Add SQLite client and converter #424

merged 3 commits into from
Feb 5, 2024

Conversation

criccomini
Copy link
Contributor

@criccomini criccomini commented Feb 1, 2024

Recap can now read SQLite schemas as Recap types. SQLite's schema system is
somewhat strange. Some notes:

  1. Any column can store any type.
  2. SQLite has 5 storage classes (null, int, real, text, blob).
  3. STRICT forces column types to be the storage types.
  4. non-STRICT tables allow any strings for column types.
  5. non-STRICT column types are hints to coerce types as they're written to disk.
  6. Parenthesis in types (e.g. DOUBLE(6, 2)) are ignored by SQLite.

See https://www.sqlite.org/datatype3.html#storage_classes_and_datatypes for more
details.

With all of this in mind, Recap's SQLiteConverter works according to SQLite's
affinity rules. This means:

  1. Unknown types are treated as "ANY", which is a union of all storage types.
  2. SQLiteConverter pays attention to precision/scale for REAL, etc.
  3. SQLiteConverter pays attention to lengths for VARCHAR(255), etc.
  4. SQLiteConverter treats date, datetime, time, and timestamp as ANY types.

Closes #418

There was a bug in `make_nullable` that caused UnionTypes to get double nested.
I've updated the code to keep the UnionType flat. The code now adds a NullType
if one isn't in the `types` attribute. It also sets `default` appropriate.
@criccomini criccomini changed the title Sqlite Add SQLite client and converter Feb 1, 2024
@mjperrone
Copy link
Contributor

non-STRICT column types are hints to coerce types as they're written to disk.

That's interesting. That means the types are different on write vs read. You can write a "5" to a table but then get a 5 back on query. I haven't thought through all the implications of that yet, but seems worth pondering.

This reminds me of Kysely's Generated label. Kysely is a type safe query builder for typescript. It lets you define a typescript interface representing a table, then you can use it to have type safety on insert and select. For columns with default values (as in you wrote null but get a value out of it) it offers Generated, and the TS type on insert is optional, but on select it's required.

@criccomini
Copy link
Contributor Author

Yea, gotta say I'm not a fan of SQLite's type system. And yes, you're correct, you can write "5" and get back 5. And this coercion is based on the type of the column, which can be anything! You can make the type NOT AN INTEGER, then write "5", and get back 5. 😭

@criccomini
Copy link
Contributor Author

That said, for the happy path, I think my converter should work just fine. The only caveat is date types are treated as "any" types. This is actually how SQLite handles it. I'm open to extending the converter to handle dates in a custom way, but would rather keep that as a separate PR.

Recap can now read SQLite schemas as Recap types. SQLite's schema system is
somewhat strange. Some notes:

1. Any column can store any type.
2. SQLite has 5 storage classes (null, int, real, text, blob).
3. STRICT forces column types to be the storage types.
4. non-STRICT tables allow any strings for column types.
5. non-STRICT column types are hints to coerce types as they're written to disk.
6. Parenthesis in types (e.g. DOUBLE(6, 2)) are ignored by SQLite.

See https://www.sqlite.org/datatype3.html#storage_classes_and_datatypes for more
details.

With all of this in mind, Recap's SQLiteConverter works according to SQLite's
affinity rules. This means:

1. Unknown types are treated as "ANY", which is a union of all storage types.
2. SQLiteConverter pays attention to precision/scale for REAL, etc.
3. SQLiteConverter pays attention to lengths for VARCHAR(255), etc.
4. SQLiteConverter treats date, datetime, time, and timestamp as ANY types.

Closes #418
@criccomini
Copy link
Contributor Author

@mjperrone This is ready for a review. 😄

Copy link
Contributor

@mjperrone mjperrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! BTW the only info I got about potential use cases is that it would be v3.


def ls(self) -> list[str]:
cursor = self.connection.cursor()
cursor.execute("SELECT name FROM sqlite_schema WHERE type='table'")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could imagine also wanting to capture schema of views. Doesn't seem necessary for the first go at this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh, good point. I'll open a follow-on GH issue for that. I hope it's as easy as type in ('table', 'view') 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 94 to 95
row = self.add_information_schema(row)
row = self.add_information_schema(row)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this duplicated?

Suggested change
row = self.add_information_schema(row)
row = self.add_information_schema(row)
row = self.add_information_schema(row)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I had two different functions but merged them into one i_s function..

rows = []

for row_cells in cursor.fetchall():
row = dict(zip(names, row_cells))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<3 python

}

# Extract precision, scale, and octet length.
numeric_pattern = re_compile(r"(\w+)\((\d+)(?:,\s*(\d+))?\)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GPT helped me understand this matches strings that look like function calls. Might be helpful to point to some documentation on what TYPE can look plike

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added docs.

cursor.execute(
"SELECT name FROM sqlite_master WHERE type='table' AND name=?", (table,)
)
return bool(cursor.fetchone())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This confuses me since above I saw row[0] for row in cursor.fetchall(). If there is only one element SELECTed, is a tuple returned for that row, or just the value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The content isn't as important as whether the row exists or not. I'm doing the _table_exists check to prevent SQL injection attacks since you can't %s or ? parameterize pragma calls. The SELECT is getting all rows that exactly match the table string. If at least one row exists, then we assume the table exists.

The row[0] for row in cursor.fetchall() is for listing all tables in the database. But it's not guaranteed that a user will call schema() with a table from the ls() command, so I wanted to guard against injection.

name TEXT NOT NULL,
short_name CHAR(20) NOT NULL,
variable_name VARCHAR(255) NOT NULL,
descriptive_text CLOB NOT NULL,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLOB is a weird abbreviation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. And yet it's a thing. 😛

Python's `urlunparse` has this annoying behavior where it doesn't include double
slashes if the netloc param is empty. This is annoying with URLs like
`sqlite:///foo/bar/baz`. The parser would return `sqlite:/foo/bar/baz` in this
scenario. I've updated the safe/unsafe methods in `RecapSettings` to account for
this.
@criccomini criccomini merged commit 0ea697d into main Feb 5, 2024
3 checks passed
@criccomini criccomini deleted the sqlite branch February 5, 2024 22:10
@criccomini
Copy link
Contributor Author

Merged! It's now available in 0.11.0! 😄

https://pypi.org/project/recap-core/

@mjperrone
Copy link
Contributor

Awesome! You also might want to update the homepage compatibility table to include this 🥳

@criccomini
Copy link
Contributor Author

On the list for tomorrow! I want to add docs to the website first.

@criccomini
Copy link
Contributor Author

@criccomini
Copy link
Contributor Author

criccomini commented Feb 29, 2024

Released in 0.12.0:

https://pypi.org/project/recap-core/0.12.0/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support SQLite converter into Recap
2 participants